Hybrid machine learning-based systems and methods for training an object picking robot with real and simulated performance data

ABSTRACT

For training an object picking robot with real and simulated grasp performance data, grasp locations on an object are assigned based on object physical properties. A simulation experiment for robot grasping is performed using a first set of assigned locations. Based on simulation data from the simulation, a simulated object grasp quality of the robot is evaluated for each of the assigned locations. A first set of candidate grasp locations on the object is determined based on data representative of simulated grasp quality from the evaluation. Based on sensor data from an actual experiment for the robot grasping using each of the candidate grasp locations, an actual object grasp quality is evaluated for each of the candidate locations.

TECHNICAL FIELD

The present disclosure relates to robotic grasping of objects and, more particularly, to systems and methods for machine learning-based training of object picking robots.

BACKGROUND

In at least some known systems and methods for machine learning-based robot training, limited data sources are utilized, which may reduce the efficiency, speed, and accuracy of resultant machine learning models. Known systems and methods may only utilize data from real-world robot picking experiments. Likewise, known robot training systems and methods may utilize only synthetic data generated in virtual environments based on object physical properties. In any event, a lack of cooperative utilization of numerous sources of robot grasping performance data, from actual and simulated experiments and in both training and run time machine learning environments, may lead to lost opportunities for efficiency gains for object picking robot training and for the various industrial processes for which such systems are applied.

SUMMARY

The systems and methods for hybrid machine learning (ML)-based training of object picking robots with real and simulated grasp performance data disclosed herein present a new and improved processing pipeline for solving robot picking problems. A hybrid machine learning engine is used to train the robot to learn the grasp location in both simulated and real-world environments. A sensor feedback system fuses multiple sensor signals to evaluate the grasp quality. The disclosed systems and methods are further capable of run time/in-line self-correction and fine-tuning the robot grasp by online learning.

In one aspect, a method for training an object picking robot with real and simulated grasp performance data is provided. The method includes assigning a plurality of grasp locations on an object based on known or estimated physical properties of the object. The method includes performing a first simulation experiment for the robot grasping the object using a first set of the plurality of assigned grasp locations. The method includes evaluating, based on a first set of simulation data from the first simulation experiment, a simulated object grasp quality of the robot grasping for each of the first set of assigned grasp locations. The method includes determining a first set of candidate grasp locations on the object based on data representative of a simulated grasp quality obtained in the evaluating step for the simulated object grasp quality. The method includes evaluating, based on a first set of grasp quality sensor data from a first actual experiment for the robot grasping the object using each of the first set of candidate grasp locations, an actual object grasp quality of the robot grasping for the each of the first set of candidate grasp locations. The method includes, for the each of the first set of candidate grasp locations, determining a convergence of the actual object grasp quality and the simulated object grasp quality based on: the first set of grasp quality sensor data from the first actual experiment, and the data representative of the simulated grasp quality.

In another aspect, a system for training an object picking robot with real and simulated grasp performance data is provided. The system includes one or more memory devices, and one or more processors in communication with the one or more memory devices and the object picking robot. The one or more processors are programmed to assign a plurality of grasp locations on an object based on known or estimated physical properties of the object. The one or more processors are programmed to perform a first simulation experiment for the robot grasping the object using a first set of the plurality of assigned grasp locations. The one or more processors are programmed to evaluate, based on a first set of simulation data from the first simulation experiment, a simulated object grasp quality of the robot grasping for each of the first set of assigned grasp locations. The one or more processors are programmed to determine a first set of candidate grasp locations on the object based on data representative of a simulated grasp quality obtained in evaluating the simulated grasp quality. The one or more processors are programmed to evaluate, based on a first set of sensor data from a first actual experiment for the robot grasping the object using each of the first set of candidate grasp locations, an actual object grasp quality of the robot grasping for the each of the first set of candidate grasp locations.

In yet another aspect, a non-transitory computer-readable storage medium storing processor-executable instructions for training an object picking robot with real and simulated grasp performance data is provided. When executed by one or more processors, the processor-executable instructions cause the one or more processors to: (a) assign a plurality of grasp locations on an object based on known or estimated physical properties of the object; (b) perform a first simulation experiment for the robot grasping the object using a first set of the plurality of assigned grasp locations; (c) evaluate, based on a first set of simulation data from the first simulation experiment, a simulated object grasp quality of the robot grasping for each of the first set of assigned grasp locations; (d) determine a first set of candidate grasp locations on the object based on data representative of a simulated grasp quality obtained in evaluating the simulated grasp quality; and (e) evaluate, based on a first set of grasp quality sensor data from a first actual experiment for the robot grasping the object using each of the first set of candidate grasp locations, an actual object grasp quality of the robot grasping for the each of the first set of candidate grasp locations.

The systems and methods for hybrid ML-based training of object picking robots with real and simulated grasp performance data disclosed herein provide users a number of beneficial technical effects and realize various advantages as compared to known systems and methods. Such benefits include, without limitation, enabling picking and placing of object(s) with enhanced accuracy, speed, and efficiency, and with reduced error rates, as compared to known processes. Utilizing the disclosed systems and methods for hybrid ML-based training of object picking robots thereby results in a reduction in the required number of per object CPU clock cycles needed for processor(s) in both training and run time environments. The systems and methods for hybrid ML-based training of object picking robots described herein enable continuous evaluation and monitoring of object grasping performance so that the hybrid ML operations may be implemented for fine-tuning and enhancing the accuracy and robustness of ML models, as needed, including across numerous robots involved in unit operations. As such, the disclosed systems and methods for hybrid ML-based training of object picking robots enable efficient and effective training of object picking robots in a wide variety of industrial applications where improved utilization of computing, memory, network bandwidth, electric power, and/or human personnel resources is desirable.

Further and alternative aspects and features of the disclosed principles will be appreciated from the following detailed description and the accompanying drawings. As will be appreciated, the principles related to the disclosed systems and methods for hybrid machine learning-based training of object picking robots with real and simulated grasp performance data are capable of being carried out in other and different embodiments, and capable of being modified in various respects. Accordingly, it is to be understood that both the foregoing summary and the following detailed description are exemplary and explanatory only and do not restrict the scope of the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a system for hybrid machine learning-based training of object picking robots with real and simulated grasp performance data according to an embodiment of the disclosure.

FIG. 2 is a flow chart of a method for hybrid machine learning-based training of object picking robots with real and simulated grasp performance data using the system shown in FIG. 1 according to an embodiment of the disclosure.

FIG. 3 is a schematic diagram of an object positioned in a simulated or an actual workspace of a robot of the system shown in FIG. 1 according to an embodiment of the disclosure.

FIG. 4 is a schematic diagram of a process for hybrid machine learning-based training of object picking robots with real and simulated grasp performance data using the system shown in FIG. 1 and the method shown in FIG. 2 according to an embodiment of the disclosure.

FIG. 5 is a block diagram of a data structure of the memory shown in the system of FIG. 1 according to an embodiment of the disclosure.

FIG. 6 is a block diagram of a software architecture for the method shown in FIG. 2 according to an embodiment of the disclosure.

FIG. 7 is a flow chart of aspects of the method shown in FIG. 2 according to embodiments of the disclosure.

DETAILED DESCRIPTION

Reference will now be made in detail to specific embodiments or features, examples of which are illustrated in the accompanying drawings. Wherever possible, corresponding or similar reference numbers will be used throughout the drawings to refer to the same or corresponding parts. Moreover, references to various elements described herein are made collectively or individually when there may be more than one element of the same type. However, such references are merely exemplary in nature. It may be noted that any reference to elements in the singular may also be construed to relate to the plural and vice-versa without limiting the scope of the disclosure to the exact number or type of such elements unless set forth explicitly in the appended claims.

FIG. 1 is a schematic diagram of a system (1) for hybrid machine learning (ML)-based training of object picking robots (2) with real and simulated grasp performance data according to an embodiment of the disclosure. FIG. 2 is a flow chart of a method (200) for hybrid ML-based training of object picking robots (2) with real and simulated grasp performance data according to an embodiment of the disclosure. In the illustrated examples, the method (200) shown in FIG. 2 is implemented, at least in part, using the system (1) of FIG. 1.

Referring to FIG. 1, system (1) includes one or more memory devices (5), also collectively referred to herein as memory (5). Memory (5) includes an ML model (68) stored therein. ML model (68) may be stored in memory (5), at least in part, as a database. System (1) includes one or more processors (3) in communication with memory (5). System (1) includes an object picking robot (2) including at least one robotic arm (12) in communication with processor(s) (3). Object picking robot (2) is also collectively referred to herein as robot (2). Processor(s) (3) include at least one transceiver (55) in communication with one or more transceivers (54) of robot (2). In an example, memory (5) is also in communication with robot (2). Robot (2) includes at least one object gripper device (30) operably coupled to robotic arm(s) (12). Gripper device(s) (30) is/are in communication with processor(s) (3). In the embodiment shown in FIG. 1, system (1) includes a plurality of robots (71). As used herein, “operably coupled” refers to two or more functionally-related components being coupled to one another for purposes of cooperative mechanical movement(s) and/or for purposes of flow of electric current and/or flow of data signals. Where this operative coupling of two or more components is for purposes of data flow, the two or more components may be operably coupled via a wired connection and/or via a wireless connection. The two or more components that are so coupled via the wired and/or wireless connection(s) may be proximate one another (e.g., a first component being in the same room or in the same component housing as a second component) or they may be separated by some distance in physical space (e.g., the first component being in a different building from the location of the second component).

System (1) includes one or more sensors (22) in communication with processor(s) (3). Sensor(s) (22) is/are positioned on and/or in robotic arm(s) (12) and/or gripper(s) (30) and operably coupled thereto. Sensor(s) (22) may be positioned and/or mounted on a structural frame of the robot (2) (e.g., stationary cameras affixed to a robot cell, a booth, and/or a safety cage, not shown in FIG. 1), either instead of, or in addition to, being positioned on and/or in robotic arm(s) (12) and/or gripper(s) (30). In an example, sensor(s) (22) is/are, or include, grasp quality sensor(s) (22). Grasp quality sensor(s) (22) is/are also collectively referred to herein as sensor(s) (22). Sensor(s) (22) include one or more of: multi-degree of freedom (DOF) force-torque sensor(s), camera(s) for monitoring and collecting visual data during robotic manipulation tasks such as grasping of object(s) (11), multi-DOF inertial measurement sensor(s) for measuring dynamic aspects of robotic object (11) handling, and directional and/or rotational motion sensor(s) (e.g., motor encoder(s), camera(s), and/or tactile sensor(s)) for monitoring and/or detecting relative motion between the object (11) and the gripper (30). In an example, multi-DOF force-torque sensor(s) and/or multi-DOF inertial measurement sensor(s) are mounted and/or otherwise operably coupled to and/or between the robotic arm(s) (12) and the gripper(s) (30) for measuring a static and/or dynamic status of object(s) (11) with respect to the gripper(s) (30) and a grasp location on the object(s) (11). Likewise, in the example, sensor(s) (22) may include fixed (to ground) grasp quality sensor(s) to facilitate the measurement of static and/or dynamic status of object(s) (11).
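
By way of a non-limiting illustration only, the following Python sketch shows one way the readings of several grasp quality sensors (22) could be fused into a single scalar grasp quality score. The field names, weights, and normalization constants are assumptions made for illustration and are not prescribed by this disclosure.

    from dataclasses import dataclass

    @dataclass
    class SensorSnapshot:
        # Assumed example channels; an actual sensor suite may expose different signals.
        grip_force_n: float       # normal force reported by the force-torque sensor
        wrench_residual: float    # deviation of the measured wrench from the expected static wrench
        vibration_rms: float      # RMS acceleration from the multi-DOF inertial sensor
        slip_events: int          # relative-motion events from tactile/encoder sensing

    def fuse_grasp_quality(s, w_force=0.4, w_wrench=0.3, w_vib=0.2, w_slip=0.1):
        """Return a fused grasp quality score in [0, 1]; higher is better (illustrative weights)."""
        force_term = min(s.grip_force_n / 20.0, 1.0)            # saturate at an assumed 20 N target
        wrench_term = max(0.0, 1.0 - s.wrench_residual / 5.0)   # penalize wrench mismatch
        vib_term = max(0.0, 1.0 - s.vibration_rms / 2.0)        # penalize wobble/vibration
        slip_term = 1.0 if s.slip_events == 0 else 0.0          # any detected slip is penalized
        return w_force * force_term + w_wrench * wrench_term + w_vib * vib_term + w_slip * slip_term

    print(fuse_grasp_quality(SensorSnapshot(18.0, 0.8, 0.3, 0)))  # example readings, made up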

System (1) includes a training ML environment (74) and a run time ML environment (80) for robot (2). Training (74) and run time (80) ML environments include at least one picking location (10) and at least one placement location (42). In an example, training ML (74) and/or run time ML (80) environments is/are simulated computing and robot (2) manipulation environment(s). In another example, training (74) and/or run time (80) ML environments is/are actual (e.g., “real world”) computing and robot (2) manipulation environment(s). In yet another example, training ML (74) and/or run time (80) ML environments is/are both simulated and actual computing and robot (2) manipulation environment(s). One or more objects (11) are delivered to and/or otherwise arrive at picking location(s) (10) for simulated and/or actual manipulation by system (1) including robot(s) (2). In the embodiment shown in FIG. 1, a plurality of object(s) (11) (e.g., a first object (82) and at least a second object (94)) are present in training (74) and/or run time (80) ML environment(s), with each of the first (82) and the at least a second (94) objects having a plurality of physical characteristics such as a shape, material(s) of construction, a weight, a density, a center of mass, a mass distribution, a height, a width, and/or a length. The plurality of object (11) physical characteristics may additionally include an object (11) identification and images depicting the appearance(s) of object(s) (11) in various poses.

Processor(s) (3) may be located in training (74) and/or run time (80) ML environment(s). Processor(s) (3) may be located remote from training (74) and/or run time (80) ML environment(s). Processor(s) (3) may be collocated with robot (2). Processor(s) (3) are programmed to implement and/or otherwise perform, at least in part, one or more of the disclosed steps of method (200) (shown in FIG. 2), including, without limitation, using system (1). Processor(s) (3) are capable of carrying out multiple functions in system (1). Processor(s) (3) include robotic arm (12) control functionality and gripper (30) control functionality. Processor(s) (3) include ML functionality. In an example, ML functionality of processor(s) (3) is implemented and/or otherwise performed, at least in part, in system (1) using one or more artificial intelligence and/or deep learning computing and processing scheme(s).

In an example, memory device(s) (5) include at least one non-transient computer-readable medium (600). Non-transient computer-readable medium (600) stores, as software (604), processor (3)-executable instructions for training robot(s) (2) with real (e.g., actual, “real-world”) and simulated grasp performance data, including, without limitation, in system (1). In an example, processor (3)-executable instructions stored as software (604) include one or more software modules (608). When executed by the processor(s) (3) that are in communication with memory (5), robotic arm(s) (12), gripper(s) (30), and sensor(s) (22), the processor (3)-executable instructions cause the processor(s) (3) to implement and/or otherwise perform, at least in part, one or more of the disclosed steps of method (200), including, without limitation, using system (1).

In system (1), processor(s) (3), memory (5), robotic arm(s) (12), gripper(s) (30), and sensor(s) (22) are in communication with one another via, and communicate with one another using signals (e.g., encoded data signals) sent and/or received through, a network (52). Communication among and between processor(s) (3), memory (5), robotic arm(s) (12), gripper(s) (30), and sensor(s) (22) is facilitated by transceivers (54, 55). In an example, system (1) communication using network (52) includes wireless communication equipment and protocols. In another example, system (1) communication using network (52) includes wired communication equipment and protocols. In yet another example, system (1) communication using network (52) includes a combination of wireless and wired communication equipment and protocols. In an example, system (1) communication includes wireless and/or wired communication equipment and protocols for utilizing cloud-based processing, data storage, and/or communication resources. In an example, system (1) communication utilizes the Internet, including, without limitation, Internet of Things (IoT) protocols, practices, and/or standards.

FIG. 3 is a schematic diagram of object (11) positioned in a plurality of poses (67) in a simulated or an actual workspace (75) of the robot (2) of the system (1) shown in FIG. 1 according to an embodiment of the disclosure. FIG. 4 is a schematic diagram of a process (300) for hybrid ML-based training of robot (2) with real and simulated grasp performance data using system (1) and method (200) according to an embodiment of the disclosure.

Referring to FIG. 3, the object (11) has known or estimated physical properties (6) including a center of mass (COM) (70) (e.g., denoted by “x” in FIG. 3). COM (70) may be known, or its location may be assigned to object (11) based on other known or estimated physical properties (6) such as mass, dimensions, material, and distribution of the mass over the volume of the object (11). As further described herein, a plurality of grasp locations (8) (e.g., eight locations, denoted by asterisks “*” in FIG. 3) are assigned on the object (11). The object may be present in the training (74) and/or the run time (80) ML environments in a plurality of different poses (e.g., first (67 a), second (67 b), and third (67 c) poses). In the embodiment, the assigned grasp locations (8) on the object (11) do not vary depending on the particular pose(s) (67) in which the object (11) is encountered by system (1). As described herein, the disclosed system (1) and method (200) include determining candidate grasp locations (20) from among the plurality of assigned grasp locations (8). In the example shown in FIG. 3, two of the eight assigned grasp locations (8) are determined as candidate grasp locations (20) (e.g., denoted by circled asterisks in FIG. 3).
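
As a minimal sketch only, and assuming a simple box-shaped object described by its dimensions and estimated COM (70), grasp locations (8) might be assigned on opposing faces so that the grasp axis passes near the COM. The heuristic, function name, and example dimensions below are illustrative assumptions, not the claimed assignment procedure.

    import numpy as np

    def assign_grasp_locations(dimensions, center_of_mass, n_heights=2):
        """Assign grasp points on the side faces of a box-shaped object (illustrative heuristic)."""
        L, W, H = dimensions
        cx, cy, cz = center_of_mass
        locations = []
        # Spread grasp heights around the COM so the grasp axis passes close to it.
        for z in np.linspace(cz - 0.25 * H, cz + 0.25 * H, n_heights):
            locations.append((0.0, cy, z))  # left face
            locations.append((L, cy, z))    # right face
            locations.append((cx, 0.0, z))  # front face
            locations.append((cx, W, z))    # back face
        return np.array(locations)

    # Eight assigned locations for a 0.2 m x 0.1 m x 0.15 m object with an estimated COM.
    print(assign_grasp_locations((0.2, 0.1, 0.15), (0.1, 0.05, 0.07)))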

Referring to FIGS. 1-4, process (300) is used for hybrid ML-based training of robot(s) (2) for picking of a particular object (11). In an example, object (11) may be a family of objects (11) having the same or similar physical properties (6). Process (300) may be implemented for robot(s) (2) separately for unique objects (11) having dissimilar physical properties (6). In the embodiment, process (300) commences with block (301). In block (301), process (300) trains the robot (2) in a simulator. From block (301), process (300) proceeds to a block (304). In block (304), process (300) determines candidate grasp locations (20) from among the plurality of assigned grasp locations (8). From block (304), process (300) proceeds to a block (307). In block (307), simulation and actual experiments for robot (2) are performed using the determined candidate grasp locations (20) on the object (11).

From block (307), process (300) proceeds to a block (310). In block (310), process (300) obtains data from grasp quality sensor(s) (22) for feedback with respect to grasp performance of the robot (2) grasping the object (11). From block (310), process (300) proceeds to a block (313). In block (313), process (300) evaluates the grasp quality of the object (11) by the robot (2) based on the feedback data from the sensor(s) (22). From block (313), process (300) proceeds to a logic decision (316). In logic decision (316), process (300) determines an existence of a convergence of the actual object (11) grasp quality (e.g., determined from actual robot action in an actual experiment of block (307)) and a simulated object (11) grasp quality (e.g., determined from simulated robot action in a simulated experiment of block (307)). As used herein, “convergence” means that the computed grasp gets closer and closer to the ground truth object grasp location as the algorithm iterations proceed. Its opposite is “divergence,” in which the output undergoes larger and larger oscillations and never approaches a useful result.
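
A minimal sketch of one possible convergence test follows, assuming convergence is declared when the gap between simulated and actual grasp quality stays within a tolerance for several consecutive iterations; the tolerance and window size are illustrative assumptions rather than disclosed values.

    def has_converged(simulated_quality, actual_quality, gap_history, tol=0.05, window=3):
        """Record the sim-vs-actual quality gap and report convergence when the last
        `window` gaps are all below `tol` (illustrative criterion)."""
        gap_history.append(abs(simulated_quality - actual_quality))
        recent = gap_history[-window:]
        return len(recent) == window and all(gap < tol for gap in recent)

    gaps = []
    for sim_q, act_q in [(0.90, 0.60), (0.85, 0.74), (0.83, 0.80), (0.82, 0.79), (0.82, 0.80)]:
        print(has_converged(sim_q, act_q, gaps))   # False ... then True on the final iteration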

If, based on the result of logic decision (316), process (300) determines there is a lack of the convergence, then process (300) proceeds to a block (319). In block (319), process (300) converts the grasp quality evaluation from block (313) into a measure of grasp success probability. Upon completion of block (319), process (300) proceeds back to block (301) and process (300) imposes the grasp success probability onto the simulator for retraining the robot (2) for grasping the object (11). In an embodiment, blocks (301), (304), (307), (310), (313) and (319), and logic decision (316) of process (300) proceed in the training ML environment (74).
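
As an illustrative sketch of block (319), a scalar grasp quality evaluation might be mapped to a grasp success probability with a logistic squashing function before being imposed on the simulator; the logistic form and its parameters are assumptions for illustration only.

    import math

    def quality_to_success_probability(quality_score, midpoint=0.5, steepness=10.0):
        """Map a grasp quality score in [0, 1] to a grasp success probability (illustrative mapping)."""
        return 1.0 / (1.0 + math.exp(-steepness * (quality_score - midpoint)))

    # The resulting probabilities could bias which grasp locations the simulator
    # re-samples most often when retraining resumes at block (301).
    for q in (0.2, 0.5, 0.8):
        print(q, round(quality_to_success_probability(q), 3))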

For a result of logic decision (316) indicating a presence of the convergence, process (300) proceeds to a block (322) instead of proceeding to block (319). In block (322), process (300) implements run time performance of robot (2) actions for picking of object(s) (11) using at least one of the candidate grasp locations (20) determined in block (304). In an embodiment, block (322) also includes, at least in part, the above-described functionality of block (310). From block (322), process (300) proceeds to a block (325). In block (325), process (300) monitors the grasp quality of the robot (2) grasping the object (11). In process (300), the monitoring in block (325) includes, at least in part, the functionality of, or process (300) proceeds to, a block (328). In block (328), process (300) implements a hybrid ML engine (51) for facilitating continuous monitoring and evaluation of object (11) grasp quality across the training (74) and run time (80) ML environments and, in applicable cases, across a plurality of robots (71). As further described herein, the monitoring and the hybrid ML engine (51) implemented in block (325) and in block (328), respectively, facilitate determination of a need for retraining of robot (2) grasping object (11) based on the run time object (11) grasping performance and the grasp quality thereof. Thus, for example, in cases where, in block (325) and/or block (328), process (300) determines a need for retraining robot (2) for object (11) grasping, process (300) proceeds from block (325) and/or block (328) to block (301). In an embodiment, blocks (322), (325) and (328) of process (300) proceed in the run time ML environment (80).

FIG. 5 is a block diagram of a data structure (502) of the memory (5) of system (1) according to an embodiment of the disclosure. FIG. 6 is a block diagram of a software architecture for method (200) according to an embodiment of the disclosure. Referring to FIGS. 1-6, method (200) includes assigning (203) and storing (206), by processor(s) (3) and in memory (5), respectively, the plurality of grasp locations (8) on the object (11) based on known or estimated physical properties (6) of the object (11). In an example, the assigning (203) and/or storing (206) step(s) facilitate setting up simulation and/or actual experiment(s) for the robot (2) and the object (11) to be grasped. In an example, the plurality of grasp locations (8) on the object (11) are assigned (203) and/or stored (206) by, or at the direction of, at least one user (60) of system (1). In another example, the plurality of grasp locations (8) on the object (11) are assigned (203) and/or stored (206) by, or at the direction of, at least one administrator (48) of system (1). In yet another example, the plurality of grasp locations (8) on the object (11) are assigned (203) and/or stored (206) by, or at the direction of, processor(s) (3). In embodiments for which the assigning (203) and/or storing (206) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in assigning (603) and/or storing (606) module(s). In an example, the plurality of grasp locations (8) on the object (11) are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

Method (200) includes performing (209), by processor(s) (3), a first simulation experiment (212) for the robot (2) grasping (215) the object (11) using a first set (14) of the plurality of assigned (203) grasp locations (8). In an example, the first simulation experiment (212) is performed (209) in the training ML environment (74). In embodiments for which the performing (209) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a performing module (609). In an example, the first set (14) of the plurality of assigned (203) grasp locations (8) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).
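
The following sketch indicates, under stated assumptions, how the performing (209) step might run repeated simulated grasp trials at each assigned grasp location and record the outcomes as the first set (15) of simulation data (17). The `simulate_grasp` stand-in below is a hypothetical placeholder for a physics-based grasp simulator, not an actual simulator API.

    import random

    def simulate_grasp(location, trial):
        # Hypothetical stand-in for a physics-based grasp simulation; outcome is random here.
        rng = random.Random(hash((location, trial)) % (2**32))
        return rng.random() > 0.4

    def run_simulation_experiment(assigned_locations, trials_per_location=20):
        """Return per-location simulation data: a list of boolean grasp outcomes."""
        return {loc: [simulate_grasp(loc, t) for t in range(trials_per_location)]
                for loc in assigned_locations}

    sim_data = run_simulation_experiment([(0.0, 0.05, 0.05), (0.2, 0.05, 0.05)])
    print({loc: sum(outcomes) for loc, outcomes in sim_data.items()})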

Method (200) includes evaluating (218), by processor(s) (3), a simulated object (11) grasp quality of the robot (2) grasping (215) for each of the first set (14) of assigned (203) grasp locations (8). In the embodiment, the simulated object (11) grasp quality is evaluated (218) based on a first set (15) of simulation data (17) from the first simulation experiment (212). The evaluating (218) step thereby facilitates adjudging grasping (215) performance by robot (2) in the first simulation experiment (212). In embodiments for which the evaluating (218) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in an evaluating module (618). In an example, the first set (15) of simulation data (17) from the first simulation experiment (212) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

Method (200) includes determining (221), by processor(s) (3), a first set (16) of candidate grasp locations (20) on the object (11). In the embodiment, the first set (16) of candidate grasp locations (20) on the object (11) is determined (221) based on data (18) representative of a simulated grasp quality obtained (224) in the evaluating (218) step. The evaluating (218) step thereby facilitates generating a list of candidate grasp locations to be used for actual experiments based on the evaluated (218) and adjudged grasping (215) performance of robot (2) in the first simulation experiment (212). The first set (16) of candidate grasp locations (20) on the object (11) is thus selected from the assigned (203) plurality of grasp locations (8) on the object (11). In embodiments for which the determining (221) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a determining module (621). In an example, the first set (16) of candidate grasp locations (20) on the object (11) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).
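
As a minimal sketch of the determining (221) step, assuming that simulated grasp quality is summarized as a per-location simulated success rate, candidate locations might be the k best-ranked assigned locations; the value of k and the ranking criterion are illustrative assumptions.

    def select_candidate_locations(sim_data, k=2):
        """Rank assigned grasp locations by simulated success rate and keep the top k as candidates."""
        rates = {loc: sum(outcomes) / len(outcomes) for loc, outcomes in sim_data.items()}
        ranked = sorted(rates, key=rates.get, reverse=True)
        return ranked[:k]

    print(select_candidate_locations({
        "loc_A": [True, True, False, True],
        "loc_B": [False, True, False, False],
        "loc_C": [True, True, True, True],
    }))   # -> ['loc_C', 'loc_A']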

Method (200) may include performing (227), by processor(s) (3) and robot (2), a first actual experiment (230) for the robot (2) grasping (215) the object (11) using each of the determined (221) first set (16) of candidate grasp locations (20). In an example, the first actual experiment (230) is performed (227) in the run time ML environment (80). In embodiments for which the performing (227) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a performing module (627). It is to be understood, however, that steps of the disclosed method (200) (e.g., the performing step (227)) that are described as being performed by robot (2) in conjunction with processor(s) (3) may be performed without such close cooperation between robot (2) and processor(s) (3). For instance, processor(s) (3) performing the disclosed method (200) may receive data for processing from sensor(s) (22) that are not necessarily coupled or mounted to robot (2), but nevertheless provide (e.g., transmit) sensor (22) data to processor(s) (3) for use in the disclosed method (200). Such cases exemplify the flexibility of the disclosed systems, methods, and software in that they are beneficially applicable to a wide variety of robots, gripper devices, and sensors.

Method (200) includes evaluating (233), by processor(s) (3), an actual object (11) grasp quality of the robot (2) grasping (215) for each of the determined (221) first set (16) of candidate grasp locations (20). In the embodiment, the actual object (11) grasp quality of the robot (2) grasping (215) for each of the determined (221) first set (16) of candidate grasp locations (20) is evaluated (233) based on a first set (19) of grasp quality sensor (22) data (26) from the first actual experiment (230). The evaluating (233) step thereby facilitates adjudging grasping (215) performance by robot (2) in the first actual experiment (230). In embodiments for which the evaluating (233) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in an evaluating module (633). In an example, the first set (19) of grasp quality sensor (22) data (26) from the first actual experiment (230) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

Method (200) includes determining (236), by processor(s) (3), a convergence of the actual object (11) grasp quality and the simulated object (11) grasp quality. In the embodiment, the convergence is determined (236) based on: the first set (19) of grasp quality sensor (22) data (26) from the first actual experiment (230), and the data (18) representative of the simulated grasp quality. In an example, the determining (236) step includes determining (236) the convergence of the actual object (11) grasp quality and the simulated object (11) grasp quality for each of the determined (221) first set (16) of candidate grasp locations (20). In embodiments for which the determining (236) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a determining module (636). In an example, a determined (236) convergence status (66) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

FIG. 7 is a flow chart of aspects of method (200) according to embodiments of the disclosure. Referring to FIGS. 1-7, in an embodiment, method (200) includes determining (239), by processor(s) (3), a grasp success probability value (29). In the embodiment, the grasp success probability value (29) is determined (239) for each of the determined (221) first set (16) of candidate grasp locations (20) based on the first set (19) of grasp quality sensor (22) data (26) obtained (241) in the evaluating (233) step. In an embodiment, the grasp success probability value (29) is determined (239) in response to determining (236) an absence of the convergence (e.g., a negative convergence status (66)) of the actual and the simulated object (11) grasp quality for at least one of the determined (221) first set (16) of candidate grasp locations (20). The determining (239) step thereby facilitates efficient use of computing, memory, bandwidth, and/or power resources in cases where at least one, but not all, of the determined (221) first set (16) of candidate grasp locations (20) lack convergence (e.g., do not have a positive convergence status (66)). Thus, in the embodiment, a grasp success probability value (29) is determined (239) only for those candidate grasp locations (20) lacking convergence. In embodiments for which the determining (239) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a determining module (639).

In an embodiment, method (200) includes assigning (242) and storing (245), by processor(s) (3) and in memory (5), respectively, grasp success probability values (32) respectively determined (239) for each of the determined (221) first set (16) of candidate grasp locations (20). In embodiments for which the assigning (242) and/or storing (245) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in assigning (642) and/or storing (645) module(s). In an example, determined (239) grasp success probability value(s) (29) and/or grasp success probability values (32) is/are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

In an embodiment, the determining (239) step of method (200) includes transforming (248), by processor(s) (3), each of the first set (19) of grasp quality sensor (22) data (26) into a discrete value (35). In embodiments for which the transforming (248) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a transforming module (648). In an example, discrete value(s) (35) resulting from the transforming (248) step is/are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3). In the embodiment, the determining (239) step includes scoring (251), by processor(s) (3), each of the determined (221) first set (16) of candidate grasp locations (20) based on respective transformed (248) discrete values (35). In the embodiment, a score value (38) for each of the determined (221) first set (16) of candidate grasp locations (20) is proportional to the determined (239) grasp success probability value (29). In embodiments for which the scoring (251) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a scoring module (651). In an example, score value(s) (38) resulting from the scoring (251) step is/are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).
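
A minimal sketch of the transforming (248) and scoring (251) steps follows, assuming normalized sensor readings are binned into three discrete values and the per-location score is the normalized sum of those values (so that the score is proportional to a grasp success probability value); the bin thresholds are illustrative assumptions.

    def to_discrete(reading, thresholds=(0.33, 0.66)):
        """Bin a normalized grasp quality sensor reading into a discrete value 0, 1, or 2."""
        if reading < thresholds[0]:
            return 0
        return 1 if reading < thresholds[1] else 2

    def score_candidate(readings):
        """Score a candidate grasp location from several normalized readings; the score is
        scaled to [0, 1] so that it can be treated as proportional to a success probability."""
        discrete = [to_discrete(r) for r in readings]
        return sum(discrete) / (2 * len(discrete))

    print(score_candidate([0.9, 0.7, 0.4]))   # discrete values 2, 2, 1 -> score 5/6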

In an embodiment, the transforming (248) step of method (200) includes determining (254) the discrete value (35) based on a plurality of grasp quality sensor (22) readings (41) for each of the determined (221) first set (16) of candidate grasp locations (20). In embodiments for which the determining (254) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a determining module (654). In an example, the plurality of grasp quality sensor (22) readings (41) used for determining (254) the discrete value are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

In an embodiment, method (200) includes iterating (257), by processor(s) (3) and robot(s) (2), through the performing (209), evaluating (218), determining (221), performing (227), evaluating (233), and determining (236) steps for the at least one candidate grasp location (20). In the embodiment, the iterating (257) step is performed in method (200) for at least one iteration (266). In an example, the iterating (257) step is performed in response to determining (236) the absence of the convergence (e.g., negative convergence status (66)) of the actual and the simulated object (11) grasp quality for the at least one of the determined (221) first set (16) of candidate grasp locations (20). The iterating (257) step thereby facilitates efficient use of computing, memory, bandwidth, and/or power resources in cases where at least one of the determined (221) first set (16) of candidate grasp locations (20) lacks convergence (e.g., does not have a positive convergence status (66)). Thus, in the embodiment, the performing (209), evaluating (218), determining (221), performing (227), evaluating (233), and determining (236) steps are iterated (257) through only for those candidate grasp locations (20) lacking convergence. The iterating (257) step thus directs method (200) back to the simulation (e.g., at least a second simulation experiment (265)) and/or actual (e.g., at least a second actual experiment (267)) experiments for retraining robot (2) grasping (215) for the at least one candidate grasp location (20), which ensures that the ML model (68) maintains accurate data for respective candidate grasp locations (20) for the object (11). In embodiments for which the iterating (257) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in an iterating module (657).

In an embodiment, the iterating (257) step of method (200) includes, for the at least one iteration (266), determining (260), by processor(s) (3), a grasp success probability distribution (44) for the determined (221) first set (16) of candidate grasp locations (20). In the embodiment, the grasp success probability distribution (44) is determined (260) based on the determined (239) grasp success probability values (29). In the embodiment, the iterating (257) step includes, for the at least one iteration (266), imposing (263), by processor(s) (3), the determined (260) grasp success probability distribution (44) on at least one second simulation experiment (265) for the robot(s) (2) grasping (215) the object (11) using the determined (221) first set (16) of candidate grasp locations (20). The determining (260) and imposing (263) steps thus facilitate retraining robot (2) grasping (215) of the object (11) for the at least one candidate grasp location (20) by incorporating the grasp success probability distribution (44) into the ML model (68), which increases its accuracy and robustness. In embodiments for which the determining (260) and/or imposing (263) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in determining (660) and/or imposing (663) module(s). In an example, the determined (260) and imposed (263) grasp success probability distribution (44) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).
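
As an illustrative sketch of the determining (260) and imposing (263) steps, per-location grasp success probability values might be normalized into a distribution that weights how often each candidate location is sampled in the at least a second simulation experiment (265); the sampling strategy shown is an assumption, not the only way to impose the distribution.

    import random

    def grasp_success_distribution(probability_values):
        """Normalize per-location grasp success probability values into a sampling distribution."""
        total = sum(probability_values.values())
        return {loc: p / total for loc, p in probability_values.items()}

    def sample_locations_for_next_simulation(distribution, n_trials=10, seed=0):
        """Draw grasp locations for the next simulation experiment, weighted by the distribution."""
        rng = random.Random(seed)
        locations = list(distribution)
        weights = [distribution[loc] for loc in locations]
        return rng.choices(locations, weights=weights, k=n_trials)

    dist = grasp_success_distribution({"loc_A": 0.7, "loc_B": 0.2, "loc_C": 0.9})
    print(sample_locations_for_next_simulation(dist))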

In an embodiment, the iterating (257) step of method (200) includes, for the at least one iteration (266), determining (269), by processor(s) (3), at least a second set (47) of the plurality of assigned (203) grasp locations (8). In the embodiment, the at least a second set (47) of the plurality of assigned (203) grasp locations (8) is determined (269) based on the grasp success probability distribution (44) imposed (263) on the at least a second simulation experiment (265). In an example, the at least a second set (47) differs from the first set (14) by the addition or removal of at least one assigned (203) grasp location (8) on the object (11). In the embodiment, the iterating (257) step includes, for the at least one iteration (266), performing (272), by processor(s) (3), the at least a second simulation experiment (265) using the at least a second set (47) of assigned (203) grasp locations (8). The determining (269) and performing (272) steps thus facilitate retraining robot (2) grasping (215) of the object (11) by performing subsequent simulation and actual experiments using the at least a second set (47) of the plurality of assigned (203) grasp locations (8) determined (269) based on the imposed (263) grasp success probability distribution (44), which refines the assigned (203) grasp locations (8) and further increases the accuracy and robustness of the ML model (68). In embodiments for which the determining (269) and/or performing (272) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in determining (669) and/or performing (672) module(s). In an example, the determined (269) at least a second set (47) of the plurality of assigned (203) grasp locations (8) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

In an embodiment, the iterating (257) step of method (200) includes, for the at least one iteration (266), determining (275), by processor(s) (3), a maximum log likelihood of success (MLLS) (50) for each of the at least a second set (47) of assigned (203) grasp locations (8). In the embodiment, the MLLS (50) is determined (275) based on the determined (260) grasp success probability distribution (44). In the embodiment, the iterating (257) step includes, for the at least one iteration (266), determining (278), by processor(s) (3), at least a second set (53) of candidate grasp locations (20). In the embodiment, the at least a second set (53) of candidate grasp locations (20) is determined (278) based on respectively determined (275) MLLS values (56) for each of the at least a second set (47) of assigned (203) grasp locations (8). In an example, the at least a second set (53) differs from the first set (16) by the addition or removal of at least one candidate grasp location (20). The determining (275 and 278) steps thus facilitate retraining robot (2) grasping (215) of the object (11) by performing subsequent simulation and actual experiments using the at least a second set (53) of the candidate grasp locations (20) determined (278) based on the respectively determined (275) MLLS values (56), which refines the candidate grasp locations (20) and further increases the accuracy and robustness of the ML model (68). In embodiments for which the determining (275 and/or 278) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in determining module(s) (675 and/or 678). In an example, the determined (275) MLLS (50) and/or the determined (278) MLLS values (56) is/are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).
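
One possible, simplified reading of the MLLS (50) computation is sketched below: each assigned grasp location's trials are treated as Bernoulli outcomes, the MLLS is taken as the log of the maximum-likelihood estimate of the success probability, and the best-ranked locations become the at least a second set (53) of candidate grasp locations. This formulation, and the choice of k, are assumptions for illustration and may differ from the disclosed computation.

    import math

    def mlls(successes, trials, eps=1e-9):
        """Log of the maximum-likelihood estimate of the grasp success probability (eps avoids log(0))."""
        return math.log(successes / trials + eps)

    def select_by_mlls(results, k=2):
        """Keep the k assigned grasp locations with the highest MLLS values (k is an assumed choice)."""
        values = {loc: mlls(s, t) for loc, (s, t) in results.items()}
        return sorted(values, key=values.get, reverse=True)[:k]

    print(select_by_mlls({"loc_A": (8, 10), "loc_B": (5, 10), "loc_C": (9, 10)}))   # -> ['loc_C', 'loc_A']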

In an embodiment, the iterating (257) step of method (200) includes, for the at least one iteration (266), evaluating (281), by processor(s) (3), a simulated grasp quality of the robot (2) grasping (215) for each of the at least a second set (47) of assigned (203) grasp locations (8) on the object (11). In the embodiment, the simulated grasp quality of the robot (2) grasping (215) for each of the at least a second set (47) of assigned (203) grasp locations (8) is evaluated (281) based on at least a second set (59) of simulation data (24) from the at least a second simulation experiment (265). In the embodiment, the determining (275) step of method (200) includes determining (284) the MLLS (50) further based on data (27) representative of the simulated grasp quality obtained (277) by processor(s) (3) in the evaluating (281) step from the at least a second set (59) of simulation data (24) from the at least a second simulation experiment (265). In embodiments for which the evaluating (281) and/or determining (284) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in evaluating (681) and/or determining (684) module(s). In an example, the at least a second set (59) of simulation data (24) from the at least a second simulation experiment (265) and/or the data (27) representative of the simulated grasp quality obtained (277) by processor(s) (3) in the evaluating (281) step is/are stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

In an embodiment, method (200) includes assigning (285) and storing (287), by processor(s) (3) and in memory (5), respectively, a hyper-parameter (65) of the ML model (68). As used herein, “hyper-parameter” means one or more settings that can be tuned to control the behavior of a machine learning algorithm. The hyper-parameter (65) is representative of a simulated grasp quality for the at least a second simulation experiment (265) for the robot(s) (2) grasping (215) the object (11). In embodiments for which system (1) includes a plurality of object picking robots (71) (e.g., a first robot (2) and at least a second robot (2)), the method (200) further includes sharing (290) the hyper-parameter (65) assigned (285) to a first robot (2) with the at least a second robot (2). The assigning (285) and sharing (290) steps thereby facilitate efficient use of computing, memory, bandwidth, and/or power resources in cases where a plurality of robots (71) are utilized in the run time ML (80) and/or training ML (74) environments. Thus, in the embodiment, only one robot (2) (e.g., the first robot (2)) of the plurality of robots (71) needs to be trained and/or retrained and the ML model (68) developed for it according to the disclosed method (200). As such, additional robot(s) (2) (e.g., the at least a second robot (2) having the same or similar design specifications and/or meeting the same or similar functional requirements as the first robot (2)) may not need to be trained and/or retrained for grasping (215) the object (11). In embodiments for which the assigning (285), storing (287), and/or sharing (290) step(s) is/are implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in assigning (685), storing (687), and/or sharing (690) module(s). In an example, the hyper-parameter (65) assigned (285) to the ML model (68) is stored (287) in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).
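
A minimal sketch, under stated assumptions, of the assigning (285), storing (287), and sharing (290) steps: a hyper-parameter tuned on one trained robot is serialized and copied to the remaining robots in a fleet. The in-memory dictionary stands in for whatever persistent store and network transport a deployment would actually use, and the field names are hypothetical.

    import json

    def share_hyper_parameters(trained_robot_id, hyper_parameters, fleet):
        """Serialize hyper-parameters from a trained robot and copy them to the other robots."""
        payload = json.dumps({"source": trained_robot_id, "hyper_parameters": hyper_parameters})
        for robot_id in fleet:
            if robot_id != trained_robot_id:
                fleet[robot_id]["hyper_parameters"] = json.loads(payload)["hyper_parameters"]
        return fleet

    fleet = {"robot_1": {"hyper_parameters": {"sim_quality_weight": 0.8}},
             "robot_2": {}, "robot_3": {}}
    print(share_hyper_parameters("robot_1", fleet["robot_1"]["hyper_parameters"], fleet))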

In an embodiment, the performing (209), evaluating (218), determining (221), performing (227), evaluating (233), and determining (236) steps of method (200) are performed in the ML training environment (74). In the embodiment, method (200) includes monitoring (293), by processor(s) (3), a run time grasp quality of the robot(s) (2) grasping (215) object (11) using the at least one candidate grasp location (20). In the embodiment, the run time grasp quality of the robot(s) (2) grasping (215) object (11) using the at least one candidate grasp location (20) is monitored (293) based on a run time set (77) of grasp quality sensor (22) data (26) obtained (295) from the robot(s) (2) grasping (215) the object (11) in the ML run time (e.g., actual, “real world”) environment (80). In the embodiment, the monitoring (293) step is performed in response to determining (236) a presence of the convergence (e.g., positive convergence status (66)) of the actual and the simulated object (11) grasp quality for at least one of the determined (221) first set (16) of candidate grasp locations (20). The monitoring (293) step thereby facilitates efficient use of computing, memory, bandwidth, and/or power resources in cases where at least one of the determined (221) first set (16) of candidate grasp locations (20) has convergence (e.g., does not have a negative convergence status (66)). Thus, in the embodiment, only those candidate grasp locations (20) having the convergence proceed to, and are monitored (293) in, the run time ML environment (80), which helps ensure the ML model (68) is maintained with accurate and up-to-date data for use in both the run time ML (80) and the training ML (74) environments. In embodiments for which the monitoring (293) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in a monitoring (693) module. In an example, the run time set (77) of grasp quality sensor (22) data (26) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

In an embodiment, the method (200) includes iterating (296), by processor(s) (3) and robot(s) (2), through the performing (209), evaluating (218), determining (221), performing (227), evaluating (233), and determining (236) steps for the at least one candidate grasp location (20). In the embodiment, the iterating (296) step is performed in method (200) for at least one iteration (273). In an example, the iterating (296) step is performed in response to the monitored (293) run time grasp quality decreasing below or otherwise not meeting a user (60)-predetermined quality threshold (83). In an example, the user (60)-predetermined quality threshold (83) is an occurrence (e.g., as sensed by camera sensor(s) (22)) of the object (11) falling out of the gripper (30). In another example, the user (60)-predetermined quality threshold (83) is an occurrence (e.g., as sensed by multi-degree of freedom (DOF) force-torque, multi-DOF inertial measurement, and/or tactile sensor(s) (22)) of the object (11) exhibiting wobble and/or vibration when being grasped (215) by and/or carried in the gripper (30), where such wobble and/or vibration undesirably exceeds levels normally observed during operation of robot (2). In yet another example, the user (60)-predetermined quality threshold (83) is an occurrence (e.g., as sensed by sensor(s) (22)) of the object (11) exhibiting relative motion and/or physical dynamics with respect to and/or as compared to the gripper (30) which is out of tolerance as compared to an expected or user (60)-specified amount of relative motion.
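
By way of illustration only, the monitoring (293) and iterating (296) logic might be reduced to a check of successive run time grasp quality scores against the user (60)-predetermined quality threshold (83), flagging retraining after several consecutive sub-threshold grasps; the numeric threshold and the consecutive-failure count below are assumptions.

    def needs_retraining(run_time_scores, quality_threshold=0.7, max_consecutive_failures=3):
        """Flag retraining when run time grasp quality stays below the threshold for several grasps."""
        consecutive = 0
        for score in run_time_scores:
            consecutive = consecutive + 1 if score < quality_threshold else 0
            if consecutive >= max_consecutive_failures:
                return True
        return False

    print(needs_retraining([0.9, 0.85, 0.6, 0.65, 0.55]))   # True: three low-quality grasps in a row
    print(needs_retraining([0.9, 0.6, 0.8, 0.65, 0.88]))    # False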

In the embodiment, the iterating (296) step thereby facilitates efficient use of computing, memory, bandwidth, and/or power resources in cases where, based on the monitored (293) run time grasp quality, optimization of the ML model (68) is needed to ensure consistent object (11) grasping (215) performance during run time. The monitoring (293) and iterating (296) steps, along with one or more of the above-described steps of method (200), together constitute a hybrid ML engine (51) facilitating continuous monitoring and evaluation of object (11) grasp quality across the training (74) and run time (80) ML environments and, in applicable cases, across a plurality of robots (71). Furthermore, in a use case, monitored (293) run time grasp quality for at least one robot (2) may not meet the user (60)-predetermined quality threshold (83) due to a mechanical problem with that robot (2). In this use case, the mechanical problem may be identified and rectified so that normal unit operations may be resumed following retraining of the respective robot (2) as needed, including via the iterating (296) step of method (200). In embodiments for which the iterating (296) step is implemented and/or otherwise facilitated by software (604), processor(s) (3) execute processor (3)-executable instructions stored in an iterating (696) module. In an example, the user (60)-predetermined quality threshold (83) is stored in, and read from, data structure (502) and/or elsewhere in memory (5) by, or at the direction of, processor(s) (3).

Using the disclosed systems and methods for hybrid ML-based training of object picking robots with real and simulated grasp performance data provides users a number of beneficial technical effects and realizes various advantages as compared to known systems and methods. Such benefits include, without limitation, enabling picking and placing of object(s) with enhanced accuracy, speed, and efficiency, and with reduced error rates, as compared to known processes. Utilizing the disclosed systems and methods for hybrid ML-based training of object picking robots thereby results in a reduction in the required number of per object CPU clock cycles needed for processor(s) in both training and run time environments, as compared to known systems and methods. The above-described hybrid ML-based training of object picking robots enables continuous evaluation and monitoring of object grasping performance so that the hybrid ML operations may be implemented for fine-tuning and enhancing the accuracy and robustness of ML models as needed, including across numerous robots involved in unit operations. As such, the disclosed systems and methods for hybrid machine learning-based training of object picking robots enable efficient and effective training of object picking robots in a wide variety of industrial applications where improved utilization of computing, memory, network bandwidth, electric power, and/or human personnel resources is desirable.

These and other substantial and numerous technical benefits and beneficial effects appreciable to persons of ordinary skill in the art are especially evident as compared to known systems and methods in applications involving high volume and high tempo industrial operations. These improvements over known systems and methods are not accomplished by merely utilizing conventional and routine processing systems and methods. Even in cases where such improvements may be quantified in terms of per object time reductions (e.g., measured as seconds, or fractions thereof), over relevant time periods (e.g., from hours to years) and as compared to known processes, the disclosed systems and methods for hybrid machine learning-based training of object picking robots with real and simulated grasp performance data utilize computing, network, memory, electric power, and personnel, among other, resources at significantly greater efficiencies to provide improved throughput of, and overall cost reduction for, a variety of industrial unit operations involving robotic picking and placement of objects.

Various embodiments disclosed herein are to be taken in the illustrative and explanatory sense, and should in no way be construed as limiting of the present disclosure.

While aspects of the present disclosure have been particularly shown and described with reference to the embodiments above, it will be understood by those skilled in the art that various additional embodiments may be contemplated by the modification of the disclosed devices, systems, and methods without departing from the spirit and scope of what is disclosed. Such embodiments should be understood to fall within the scope of the present disclosure as determined based upon the claims and any equivalents thereof.

We claim:
 1. A method for training an object picking robot with real and simulated grasp performance data, comprising: (a) assigning, by a processor, a plurality of grasp locations on an object based on known or estimated physical properties of the object; (b) performing, by the processor, a first simulation experiment for the robot grasping the object using a first set of the plurality of assigned grasp locations; (c) evaluating, by the processor and based on a first set of simulation data from the first simulation experiment, a simulated object grasp quality of the robot grasping for each of the first set of assigned grasp locations; (d) determining, by the processor, a first set of candidate grasp locations on the object based on data representative of a simulated grasp quality obtained in (c); (e) evaluating, by the processor and based on a first set of grasp quality sensor data from a first actual experiment for the robot grasping the object using each of the first set of candidate grasp locations, an actual object grasp quality of the robot grasping for the each of the first set of candidate grasp locations; and (f) for the each of the first set of candidate grasp locations, determining, by the processor, a convergence of the actual object grasp quality and the simulated object grasp quality based on: the first set of grasp quality sensor data from the first actual experiment, and the data representative of the simulated grasp quality.
 2. The method of claim 1, further comprising: (g) in response to determining, in (f), an absence of the convergence of the actual and the simulated object grasp quality for at least one of the first set of candidate grasp locations, determining, by the processor, a grasp success probability value for the each of the first set of candidate grasp locations based on the first set of grasp quality sensor data obtained in (e).
3. The method of claim 2, further comprising: (h) assigning, by the processor, grasp success probability values respectively determined in (g) for the each of the first set of candidate grasp locations.
4. The method of claim 2, wherein determining the grasp success probability value includes: (I) transforming, by the processor, each of the first set of grasp quality sensor data into a discrete value; and (II) scoring, by the processor, the each of the first set of candidate grasp locations based on respective discrete values from (I), wherein a score value for the each of the first set of candidate grasp locations is proportional to the grasp success probability value determined in (g).
5. The method of claim 4, wherein the transforming includes determining the discrete value based on a plurality of grasp quality sensor readings for the each of the first set of candidate grasp locations.
6. The method of claim 1, wherein (b)-(f) are performed in a machine learning training environment, and wherein the method further comprises: (g) in response to determining, in (f), a presence of the convergence of the actual and the simulated object grasp quality for at least one of the first set of candidate grasp locations, monitoring, by the processor, a run time grasp quality of the robot grasping using the at least one candidate grasp location based on a run time set of grasp quality sensor data obtained from the robot grasping in a machine learning run time computing environment.
7. The method of claim 6, further comprising: (h) in response to the run time grasp quality monitored in (g) decreasing below or otherwise not meeting a user-predetermined quality threshold, iterating, by the processor, and for at least one iteration, through (b)-(f) for the at least one candidate grasp location.
8. A system for training an object picking robot with real and simulated grasp performance data, comprising: one or more memory devices; and one or more processors in communication with the one or more memory devices and the object picking robot, wherein the one or more processors are programmed to: (a) assign a plurality of grasp locations on an object based on known or estimated physical properties of the object; (b) perform a first simulation experiment for the robot grasping the object using a first set of the plurality of assigned grasp locations; (c) evaluate, based on a first set of simulation data from the first simulation experiment, a simulated object grasp quality of the robot grasping for each of the first set of assigned grasp locations; (d) determine a first set of candidate grasp locations on the object based on data representative of a simulated grasp quality obtained in (c); (e) evaluate, based on a first set of grasp quality sensor data from a first actual experiment for the robot grasping the object using each of the first set of candidate grasp locations, an actual object grasp quality of the robot grasping for the each of the first set of candidate grasp locations; and (f) for the each of the first set of candidate grasp locations, determine a convergence of the actual object grasp quality and the simulated object grasp quality based on: the first set of grasp quality sensor data from the first actual experiment, and the data representative of the simulated grasp quality.

9. The system of claim 8, wherein the one or more processors are further programmed to: (g) in response to determining, in (f), an absence of the convergence of the actual and the simulated object grasp quality for at least one of the first set of candidate grasp locations, determine a grasp success probability value for the each of the first set of candidate grasp locations based on the first set of grasp quality sensor data obtained in (e).
10. The system of claim 9, wherein the one or more processors are further programmed to: (h) further in response to determining, in (f), the absence of the convergence of the actual and the simulated object grasp quality for the at least one of the first set of candidate grasp locations, iterate, for at least one iteration, through (b)-(f) for the at least one candidate grasp location.
11. The system of claim 10, wherein, for iterating through (b)-(f) for the at least one candidate grasp location for the at least one iteration, the one or more processors are further programmed to: (I) determine a grasp success probability distribution for the first set of candidate grasp locations based on grasp success probability values determined in (g); and (II) impose the grasp success probability distribution determined in (I) on the first set of candidate grasp locations to prepare at least a second simulation experiment for the robot grasping the object.
12. The system of claim 11, wherein, for iterating through (b)-(f) for the at least one candidate grasp location for the at least one iteration, the one or more processors are further programmed to: (III) determine at least a second set of the plurality of assigned grasp locations based on the grasp success probability distribution imposed in (II) on the at least a second simulation experiment; and (IV) perform the at least a second simulation experiment using the at least a second set of assigned grasp locations.
13. The system of claim 12, wherein, for iterating through (b)-(f) for the at least one candidate grasp location for the at least one iteration, the one or more processors are further programmed to: (V) determine a maximum log likelihood of success for each of the at least a second set of assigned grasp locations based on the grasp success probability distribution determined in (I), the maximum log likelihood of success including maximum log likelihood of success values indicative of likely success of the at least a second set of assigned grasp locations; and (VI) determine at least a second set of candidate grasp locations based on maximum log likelihood of success values respectively determined in (V) for the each of the at least a second set of assigned grasp locations.
14. The system of claim 13, wherein, for iterating through (b)-(f) for the at least one candidate grasp location for the at least one iteration, the one or more processors are further programmed to: (VII) evaluate, based on at least a second set of simulation data from the at least a second simulation experiment, a simulated grasp quality of the robot grasping for the each of the at least a second set of assigned grasp locations on the object.
15. The system of claim 14, wherein, for determining the maximum log likelihood of success, the one or more processors are further programmed to: determine the maximum log likelihood of success further based on data representative of the simulated grasp quality obtained in (VII).
16. The system of claim 10, wherein the one or more processors are further programmed to: (i) assign a hyper-parameter representative of a simulated grasp quality for at least a second simulation experiment for the robot grasping the object for tuning a machine learning model.
17. The system of claim 16, wherein, for training a plurality of object picking robots including a first robot and at least a second robot, the one or more processors are further programmed to: (j) share the hyper-parameter assigned in (i) to the first robot with the at least a second robot.
18. A non-transitory computer-readable storage medium storing processor-executable instructions for training an object picking robot with real and simulated grasp performance data, wherein, when executed by one or more processors, the processor-executable instructions cause the one or more processors to: (a) assign a plurality of grasp locations on an object based on known or estimated physical properties of the object; (b) perform a first simulation experiment for the robot grasping the object using a first set of the plurality of assigned grasp locations; (c) evaluate, based on a first set of simulation data from the first simulation experiment, a simulated object grasp quality of the robot grasping for each of the first set of assigned grasp locations; (d) determine a first set of candidate grasp locations on the object based on data representative of a simulated grasp quality obtained in (c); (e) evaluate, based on a first set of grasp quality sensor data from a first actual experiment for the robot grasping the object using each of the first set of candidate grasp locations, an actual object grasp quality of the robot grasping for the each of the first set of candidate grasp locations; and (f) for the each of the first set of candidate grasp locations, determine a convergence of the actual object grasp quality and the simulated object grasp quality based on: the first set of grasp quality sensor data from the first actual experiment, and the data representative of the simulated grasp quality.
19. The non-transitory computer-readable storage medium of claim 18, wherein, when executed by the one or more processors, the processor-executable instructions further cause the one or more processors to: (g) in response to determining, in (f), an absence of the convergence of the actual and the simulated object grasp quality for at least one of the first set of candidate grasp locations, determine a grasp success probability value for the each of the first set of candidate grasp locations based on the first set of grasp quality sensor data obtained in (e); and (h) assign grasp success probability values respectively determined in (g) for the each of the first set of candidate grasp locations.
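
As a purely illustrative, non-limiting aid to the reader, the following Python sketch shows one possible, highly simplified realization of steps (a)-(f) of the method of claim 1. All function and variable names (e.g., assign_grasp_locations, simulate_grasp, run_real_grasp) are hypothetical editorial additions, and the placeholder random values stand in for a physics simulator and fused sensor feedback; the sketch assumes grasp quality can be summarized as a scalar in [0, 1] and is not the claimed implementation.

```python
# Hypothetical, simplified sketch of claim 1, steps (a)-(f).
# Names and placeholder values are illustrative only.
import random

def assign_grasp_locations(object_properties, n=20):
    """(a) Sample grasp points from known/estimated object geometry."""
    w, h = object_properties["width"], object_properties["height"]
    return [(random.uniform(0, w), random.uniform(0, h)) for _ in range(n)]

def simulate_grasp(location):
    """(b)/(c) Stand-in for a physics simulation returning a quality in [0, 1]."""
    return random.random()  # placeholder for a simulator call

def run_real_grasp(location):
    """(e) Stand-in for an actual robot trial with fused sensor feedback."""
    return random.random()  # placeholder for fused force/vision/slip signals

def train_step(object_properties, top_k=5, tolerance=0.1):
    # (a) assign grasp locations from object physical properties
    assigned = assign_grasp_locations(object_properties)
    # (b)-(c) simulate and score each assigned location
    sim_quality = {loc: simulate_grasp(loc) for loc in assigned}
    # (d) keep the top-k simulated locations as candidates
    candidates = sorted(sim_quality, key=sim_quality.get, reverse=True)[:top_k]
    # (e) evaluate actual grasp quality for each candidate
    real_quality = {loc: run_real_grasp(loc) for loc in candidates}
    # (f) convergence: simulated and actual quality agree within a tolerance
    converged = {loc: abs(sim_quality[loc] - real_quality[loc]) <= tolerance
                 for loc in candidates}
    return candidates, sim_quality, real_quality, converged

if __name__ == "__main__":
    cands, sim_q, real_q, conv = train_step({"width": 0.2, "height": 0.1})
    print(conv)
```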
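Claims 2-5 recite deriving a grasp success probability when simulation and reality do not converge, by transforming multiple sensor readings into discrete values and scoring each candidate location. The sketch below is one hedged interpretation of that scoring; the thresholds, the three-level discretization, and the averaging used to combine discrete values are editorial assumptions, not the claimed implementation.

```python
# Hypothetical sketch of claims 2-5: discretize sensor readings and score
# candidate grasp locations; thresholds and sensor semantics are assumptions.
def discretize(reading, thresholds=(0.3, 0.7)):
    """(I) Map a continuous sensor reading in [0, 1] to a discrete value {0, 1, 2}."""
    lo, hi = thresholds
    if reading < lo:
        return 0        # e.g., slip detected / poor contact
    if reading < hi:
        return 1        # marginal grasp
    return 2            # firm grasp

def grasp_success_probability(sensor_readings):
    """(II) Score a candidate location from several discretized readings.

    sensor_readings: per-trial readings in [0, 1] (e.g., fused force, vacuum,
    and vision signals). The score is normalized so it is proportional to an
    estimated grasp success probability.
    """
    discrete = [discretize(r) for r in sensor_readings]
    return sum(discrete) / (2 * len(discrete))

# Usage: probabilities assigned to each non-converged candidate location.
readings_by_candidate = {
    (0.05, 0.02): [0.9, 0.8, 0.85],
    (0.12, 0.07): [0.4, 0.2, 0.5],
}
probs = {loc: grasp_success_probability(r)
         for loc, r in readings_by_candidate.items()}
print(probs)
```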
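Claims 6 and 7 add run time (in-line) self-correction: once simulated and actual quality converge for a grasp location, the robot's run time grasp quality is monitored, and the training steps are re-run if quality falls below a user-predetermined threshold. A minimal sketch of that control loop follows; the threshold value, the moving-window averaging, and the injected callables are assumptions, with retrain standing in for re-running steps (b)-(f) as in the first sketch above.

```python
# Hypothetical sketch of claims 6-7: monitor run time grasp quality and
# trigger re-training when it drops below a user-predetermined threshold.
from collections import deque

def monitor_and_correct(get_runtime_quality, retrain, threshold=0.8,
                        window=10, max_cycles=100):
    """get_runtime_quality(): latest fused sensor quality in [0, 1].
    retrain(): re-runs training steps (b)-(f) for the affected grasp location.
    """
    recent = deque(maxlen=window)
    for _ in range(max_cycles):
        recent.append(get_runtime_quality())
        if len(recent) == window and sum(recent) / window < threshold:
            retrain()          # (h) iterate through (b)-(f) at least once
            recent.clear()     # start a fresh monitoring window
```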
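Claims 10-15 recite iterating the simulation loop by imposing a grasp success probability distribution on the candidate locations, deriving a second set of assigned locations from it, and ranking the new locations by a log likelihood of success that also reflects the newly simulated quality. The NumPy sketch below is one plausible reading offered only for orientation; the Gaussian perturbation of sampled locations and the equal weighting of sampling probability and simulated quality are illustrative choices, not the claimed method. A caller would pass the candidate list, the probabilities from the scoring sketch, and the same simulate_grasp stand-in used earlier.

```python
# Hypothetical sketch of claims 10-15: use grasp success probabilities from
# real trials to bias a second round of simulated grasp locations.
# Gaussian perturbation and 50/50 weighting are illustrative assumptions.
import numpy as np

def refine_candidates(candidates, success_probs, simulate_grasp,
                      n_samples=30, top_k=5, noise=0.01, rng=None):
    rng = rng or np.random.default_rng(0)
    locs = np.asarray(candidates, dtype=float)             # shape (k, 2)
    probs = np.asarray(success_probs, dtype=float)
    probs = probs / probs.sum()                            # (I)/(II) distribution

    # (III) sample a second set of assigned locations near likely candidates
    idx = rng.choice(len(locs), size=n_samples, p=probs)
    second_set = locs[idx] + rng.normal(scale=noise, size=(n_samples, 2))

    # (IV)/(VII) simulate the second set and record simulated quality
    sim_quality = np.array([simulate_grasp(tuple(p)) for p in second_set])

    # (V) log likelihood of success: combine sampling probability with the
    # new simulated quality (equal weighting assumed here)
    eps = 1e-9
    log_like = 0.5 * np.log(probs[idx] + eps) + 0.5 * np.log(sim_quality + eps)

    # (VI) keep the locations with the highest log likelihood of success
    best = np.argsort(log_like)[::-1][:top_k]
    return [tuple(p) for p in second_set[best]]
```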